Audio signal generation



The invention relates to generating an output audio signal based on an input
audio signal, and in particular to an apparatus for supplying an output audio signal. 

Erik Schuijers, Werner Oomen, Bert den Brinker and Jeroen Breebaart,
"Advances in Parametric Coding for High-Quality Audio", Preprint 5852, 114th AES
Convention, Amsterdam, The Netherlands, 22-25 March 2003 disclose a parametric coding
scheme using an efficient parametric representation for the stereo image. Two input signals
are merged into one mono audio signal. Perceptually relevant spatial cues are explicitly
modeled. The merged signal is encoded using a mono parametric encoder. The stereo
parameters Interchannel Intensity Difference (IID), the Interchannel Time Difference (ITD)
and the Interchannel Cross-Correlation (ICC) are quantized, encoded and multiplexed into a
bitstream together with the quantized and encoded mono audio signal. At the decoder side the
bitstream is de-multiplexed to an encoded mono signal and the stereo parameters. The
encoded mono audio signal is decoded in order to obtain a decoded mono audio signal m'
(see Fig. 1). From the mono time domain signal, a de-correlated signal is calculated using a
filter D 10 yielding optimum perceptual de-correlation. Both the mono time domain signal m'
and the de-correlated signal d are transformed to the frequency domain. Then the frequency
domain stereo signal is processed with the IID, ITD and ICC parameters by scaling, phase
modifications and mixing, respectively, in a parameter processing unit 11 in order to obtain
the decoded stereo pair l' and r'. The resulting frequency domain representations are
transformed back into the time domain.

In the MPEG-4 (ISO/IEC 14496-3:2002) Proposed Draft Amendment
(PDAM) 2, Section 5.4.6, such a de-correlated signal is obtained by convolving/filtering the
mono signal with a pre-defined impulse response.

Non pre-published European patent application 02077863.5 (Attorney docket
PHNL020639) describes the use of an all-pass filter, e.g. a comb filter, comprising a
frequency dependent delay to derive such a de-correlated signal. At high frequencies, a
relatively small delay is used, resulting in a coarse frequency resolution. At low frequencies,
a large delay results in a dense spacing of the comb filter. The filtering may be combined
with a band-limiting filter, thereby applying the de-correlation to one or more frequency
bands.

An object of the invention is to advantageously generate an output audio
signal on the basis of an input audio signal. To this end, the invention provides a device, a
method and an apparatus as defined in the independent claims. Advantageous embodiments
are defined in the dependent claims.

According to a first aspect of the invention, an output audio signal is generated
based on an input audio signal, the input audio signal comprising a plurality of input subband
signals, wherein at least part of the input subband signals is delayed to obtain a plurality of
delayed subband signals, wherein at least one input subband signal is delayed more than a
further input subband signal of higher frequency, and wherein the output audio signal is
derived from a combination of the input audio signal and the plurality of delayed subband
signals. By providing such a frequency dependent delay in the subband domain, parametric
stereo can advantageously be implemented, especially in those audio decoders where the core
decoder already includes a subband filter bank. Filter banks are commonly used in the
context of audio coding, e.g. MPEG-1/2 Layer I, II and III all make use of a 32-band
critically sampled subband filter. The plurality of delayed subband signals may be used as a
subband domain equivalent of the de-correlated signal as described above. In ideal
circumstances the correlation between the plurality of delayed subband signals and the input
audio signal is zero. However, in practical embodiments, the correlation may be up to 40%
for acceptable audio quality, up to 10% for medium to high quality audio and up to 2 or 3%
for high audio quality.
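
As a rough illustration of how such a correlation figure could be measured, the sketch below computes a normalized correlation between an input subband signal and its delayed version. The function name `normalized_correlation` and the definition used (magnitude of the complex inner product divided by the product of the signal norms) are assumptions made for illustration; the description does not prescribe a particular measure.

```python
import math

def normalized_correlation(x, y):
    """Return |<x, y>| / (||x|| * ||y||) for two equal-length sequences
    of real or complex subband samples (0.0 if either has no energy)."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    inner = sum(a * complex(b).conjugate() for a, b in zip(x, y))
    energy_x = sum(abs(a) ** 2 for a in x)
    energy_y = sum(abs(b) ** 2 for b in y)
    if energy_x == 0 or energy_y == 0:
        return 0.0
    return abs(inner) / math.sqrt(energy_x * energy_y)

# Toy subband signal and the same signal delayed by one subband sample
# (zero-filled at the start, last sample dropped).
x = [1, 0, -1, 0, 1, 0, -1, 0]
x_delayed = [0] + x[:-1]

self_corr = normalized_correlation(x, x)           # fully correlated
delay_corr = normalized_correlation(x, x_delayed)  # orthogonal toy case
```

For this toy signal the delayed version is orthogonal to the input, which corresponds to the ideal case of zero correlation; real subband signals will generally give a value between the two extremes.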

In an embodiment of the invention the output audio signal includes a plurality
of output subband signals. Combining the delayed subband signals and the input subband
signals in subband domain in order to obtain the plurality of output subband signals is then
relatively easy to implement. In practical embodiments, a time domain output audio signal is
synthesized from the plurality of output subband signals in a synthesis subband filter bank.

In order to obtain an efficient implementation a plurality of delay units is
provided, wherein the number of delay units is smaller than the number of input subband
signals, and wherein the input subband signals are subdivided in groups over the plurality of
delay units.

Best audio quality is obtained in embodiments where the delays in the 
plurality of delay units are monotonically increasing from high frequency to low frequency. 

In an advantageous embodiment of the invention, a complex filter bank is
used, which is effectively oversampled by a factor of two because for every real input sample
a complex output sample is generated which consists of effectively two values: a real and an
imaginary one. This eliminates the large aliasing components from which the MPEG-1 and
MPEG-2 critically sampled filter bank suffers.

In an efficient embodiment of generating the output audio signal, a Quadrature
Mirror Filter ("QMF") bank is used. Such a filter bank is known per se from Per Ekstrand,
"Bandwidth extension of audio signals by spectral band replication", Proc. 1st IEEE Benelux
Workshop on Model based Processing and Coding of Audio (MPCA-2002), pp. 53-58,
Leuven, Belgium, November 15, 2002. Fig. 2 shows a block diagram of such a complex
QMF analysis and synthesis filter bank. The analysis bank 30 divides the signal into N
complex valued subbands, which are down sampled internally by a factor of N. A stylized
frequency response is shown in Fig. 3. The synthesis QMF filter bank 31 takes the N
complex subband signals as input and generates a real valued PCM output signal. According
to an insight of the inventors, when a complex QMF filter bank is used, a de-correlated signal
can be created which is perceptually very close to the 'ideal' situation. For such a complex
QMF filter bank, implementations exist which are more efficient than the convolution used in
MPEG-4 PDAM 2, Section 5.4.6; such a convolution is relatively expensive with respect to
computational load and memory usage. As an additional advantage, using a complex QMF
filter bank also allows for an efficient combination of parametric stereo and Spectral Band
Replication ("SBR"). The idea behind SBR is that the higher frequencies can be
reconstructed from the lower frequencies using only very little helper information. In
practice, this reconstruction is done by means of a complex Quadrature Mirror Filter (QMF)
bank. In order to efficiently come to a de-correlated signal in the subband domain,
embodiments of the invention use a frequency (or subband index) dependent delay in the
subband domain. Because the complex QMF filter bank is not critically sampled, no extra
provisions need to be taken in order to account for aliasing. Furthermore, as the delay is
small, the over-all RAM usage of this embodiment is low. Note that in the SBR decoder as
disclosed by Ekstrand, the analysis QMF bank consists of only 32 bands, while the synthesis
QMF bank consists of 64 bands, as the core decoder runs at half the sampling frequency
compared to the entire audio decoder. In the corresponding encoder, however, a 64 bands
analysis QMF bank is used to cover the whole frequency range.
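
The band counts quoted above are consistent with both filter banks running at the same subband sample rate. The short check below illustrates this under the assumption of an output sampling frequency of 44.1 kHz; that value and the helper name `subband_rate` are chosen only for the example.

```python
from fractions import Fraction

def subband_rate(sampling_rate_hz, num_bands):
    """Sample rate of each subband when an N-band filter bank
    downsamples every subband by a factor of N."""
    return Fraction(sampling_rate_hz, num_bands)

output_rate_hz = 44100              # assumed example value
core_rate_hz = output_rate_hz // 2  # core decoder runs at half the rate

analysis_rate = subband_rate(core_rate_hz, 32)     # 32-band analysis bank
synthesis_rate = subband_rate(output_rate_hz, 64)  # 64-band synthesis bank

# Both banks handle subband samples at the same rate, so the analysis
# output can be processed on the same time grid as the synthesis input.
assert analysis_rate == synthesis_rate
```

Because the rates match, a delay expressed in subband samples means the same amount of time on the analysis side and on the synthesis side.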

These and other aspects of the invention are apparent from and will be 
elucidated with reference to the embodiments described hereinafter. 

In the drawings:

Fig. 1 shows a block diagram of a parametric stereo decoder;

Fig. 2 shows a block diagram of an N bands complex QMF analysis (left) and
synthesis (right) filter bank;

Fig. 3 shows a stylized frequency response of the N bands QMF filter banks of
Fig. 2;

Fig. 4 shows a spectrogram of an impulse response used in MPEG-4 PDAM 2,
Section 5.4.6 to generate the de-correlated signal, wherein the x-axis denotes time (samples)
and the y-axis denotes the normalized frequency;

Fig. 5 shows a block diagram showing a device according to an embodiment
of the invention;

Fig. 6 shows a delay expressed in subband samples as a function of subband
index according to an embodiment of the invention, and

Fig. 7 shows an advantageous audio decoder according to an embodiment of
the invention, which combines parametric stereo with spectral band replication.

The drawings only show those elements that are necessary to understand the
invention.

In the following, an advantageous embodiment of the invention is described
for generating a stereo output audio signal based on a mono input audio signal by using
parametric stereo. The input audio signal includes a plurality of input subband signals. The
plurality of input subband signals are delayed in a plurality of delay units providing more
delay for lower frequency subbands than for higher frequency subbands. The delayed subband
signals serve as a subband domain version of the de-correlated signal needed in the
generation of the stereo output signal.
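
A minimal sketch of such a subband-dependent delay is given below, assuming that one complex sample per subband arrives per time slot. The class name `SubbandDelay` and the four-band delay values are hypothetical and serve only to show the principle of delaying lower bands more than higher ones.

```python
from collections import deque

class SubbandDelay:
    """Delays each subband signal by its own number of subband samples."""

    def __init__(self, delays):
        # One FIFO per subband, pre-filled with zeros so that the first
        # `d` outputs of a band with delay `d` are silent.
        self._lines = [deque([0j] * d) for d in delays]

    def process(self, frame):
        """Take one complex sample per subband, return the delayed samples."""
        if len(frame) != len(self._lines):
            raise ValueError("frame length must equal the number of subbands")
        out = []
        for line, sample in zip(self._lines, frame):
            line.append(sample)
            out.append(line.popleft())
        return out

# Hypothetical 4-band example: band 0 (lowest frequency) is delayed most.
delay = SubbandDelay([3, 2, 1, 1])
frames = [[complex(t + 1, 0)] * 4 for t in range(4)]  # value t+1 at time t
delayed = [delay.process(f) for f in frames]
```

After four time slots the lowest band has only just released its first input sample, while the two highest bands lag the input by a single subband sample.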

In MPEG-4 PDAM 2, Section 5.4.6, the de-correlated signal is obtained by
first calculating a phase characteristic $\varphi$ which for a sampling frequency $f_s$ of 44.1 kHz
equals:

$$\varphi = \frac{\pi k (k-1)}{K} + \varphi_0 \qquad (1)$$

where $\varphi_0$ has a value of $\pi/2$, $K$ is equal to 256 and $k = 0 \ldots 256$. From this phase response
function a filter impulse response is then calculated using the inverse FFT. It resembles a
linear delay. This delay can be approximated by:

where d is the delay in samples and f the frequency in radians.
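
The step from a phase characteristic to an impulse response can be sketched as follows, taking the phase as φ_k = πk(k−1)/K + φ_0 with φ_0 = π/2 and K = 256. Several details are assumptions made for this illustration and are not taken from the PDAM text: the spectrum is given unit magnitude, the K+1 phase values are treated as the non-negative-frequency half of a 2K-point spectrum with Hermitian symmetry, and the sign convention of the exponent is chosen arbitrarily. A plain inverse DFT is written out instead of an FFT so that the sketch needs only the standard library; the function names are hypothetical.

```python
import cmath
import math

def phase_characteristic(K=256, phi0=math.pi / 2):
    """phi_k = pi * k * (k - 1) / K + phi0 for k = 0 .. K."""
    return [math.pi * k * (k - 1) / K + phi0 for k in range(K + 1)]

def impulse_response(phases):
    """Real impulse response whose spectrum has unit magnitude and the
    given phase on bins 0 .. K of a 2K-point DFT.

    Assumption: the remaining bins follow by Hermitian symmetry, so the
    imaginary parts at bin 0 and bin K are discarded.
    """
    K = len(phases) - 1
    n_fft = 2 * K
    spectrum = [cmath.exp(1j * p) for p in phases]
    h = []
    for n in range(n_fft):
        # Bins 0 and K appear once; bins 1 .. K-1 pair with their mirrors.
        acc = spectrum[0].real + spectrum[K].real * (-1) ** n
        for k in range(1, K):
            twiddle = cmath.exp(2j * math.pi * k * n / n_fft)
            acc += 2.0 * (spectrum[k] * twiddle).real
        h.append(acc / n_fft)
    return h

h = impulse_response(phase_characteristic())
print(len(h))  # 512 taps for K = 256
```

Convolving the mono signal with all taps of such a response is the comparatively expensive operation that the subband-domain delay is meant to avoid.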

Preferably, the input subband signals are obtained in a complex QMF analysis
filter bank, which may be present in a remote encoder, but which may also be present in the
decoder. As the outputs of a complex QMF filter bank are down sampled by a factor of N it is
not possible to exactly map a desired time domain delay to a delay within each subband. A
perceptually good approximation can be obtained by using rounded versions of the delay
function (2) as described above. As an example, the delay within each subband for N=64
subbands is shown in Fig. 6. For this particular implementation only 136 complex values
have to be stored in order to form the de-correlated signal. Note that for the higher
frequencies still a delay of a single subband sample is employed, although the delay function
above describes a value of 0 at half the sampling frequency. The delay of a single subband
sample ensures that the signal is maximally de-correlated.

Fig. 5 shows a block diagram of a device 50 according to an embodiment of
the invention for generating the plurality of delayed subband signals. The device 50 is placed
somewhere between the QMF analysis filter bank 30 and the QMF synthesis filter bank 31
and comprises a plurality of delay units 501, 502, 503 and 504. The delay unit 501 provides a
one unit delay for all subbands. A group of higher frequency subbands, e.g. bands 40-64, is
furnished without further delay to the synthesis QMF filter bank 31. The group of relatively
low frequency subbands, e.g. bands 0-40, is further delayed in delay unit 502. Part of this
group, e.g. bands 0-24, is further delayed in delay unit 503 and delay unit 504 (the latter for
subbands 0-8 only). So effectively an exemplary number of 4 groups of different delay is
created, having delays of 1, 2, 3 or 4 unit delays respectively. The delay expressed in subband
samples as a function of subband index is shown in Fig. 6. The QMF analysis filter bank 30
is usually present in an audio encoder, although for SBR a smaller M bands analysis QMF
filter bank is also used in the decoder.
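
The cascade of delay units can be turned into a delay per subband with a few lines of code. The sketch below reads the quoted band limits as half-open ranges over subband indices 0 to 63, which is an assumption; the dictionary `DELAY_UNITS` and the function name are hypothetical. Under that reading the storage requirement comes out at the 136 complex values mentioned for the N=64 example.

```python
NUM_BANDS = 64

# Each delay unit adds one subband sample of delay to the bands in its
# range. Ranges are read as half-open [start, stop), an assumption that
# matches the band limits quoted in the text.
DELAY_UNITS = {
    501: range(0, 64),  # all subbands
    502: range(0, 40),
    503: range(0, 24),
    504: range(0, 8),
}

def delay_per_band(units=DELAY_UNITS, num_bands=NUM_BANDS):
    """Total delay (in subband samples) seen by each subband index."""
    delays = [0] * num_bands
    for band_range in units.values():
        for band in band_range:
            delays[band] += 1
    return delays

delays = delay_per_band()
storage = sum(delays)  # one stored complex value per band per unit delay

print(delays[0], delays[8], delays[24], delays[40], delays[63])  # 4 3 2 1 1
print(storage)  # 136
```

The resulting list is monotonically non-increasing with subband index, so the lowest bands receive the largest delay.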

Fig. 7 shows an advantageous audio decoder 700 according to an embodiment
of the invention which combines a parametric stereo tool and SBR. A bit-stream demux 70
receives the encoded audio bitstream and derives the SBR parameters, the stereo parameters
and the core encoded audio signal. The core encoded audio signal is decoded using a core
decoder 71, which can e.g. be a standard MPEG-1 Layer III (mp3) or an AAC decoder.
Typically such a decoder runs at half the output sampling frequency (f_s/2). The resulting core
decoded audio signal is fed to an M subbands complex QMF filter bank 72. This filter bank
72 outputs M complex samples per M real input samples and is thus effectively over-sampled
by a factor of 2, as explained before. In a High-Frequency (HF) generator 73, higher
frequency subbands N-M, which are not covered by the core decoded audio signal, are
generated by replicating (certain parts of) the M subbands. The output of the high-frequency
generator 73 is combined with the lower M subbands into N complex subband signals.
Subsequently an envelope adjuster 74 adjusts the replicated high frequency subband signals
to the desired envelope and an additional component adding unit 75 adds additional
sinusoidal and noise components as indicated by the SBR parameters. The total N subband
signals are furnished to a delay unit 76, which may be equal to the device 50 shown in Fig.
5, in order to generate the delayed subband signals. The N delayed subband signals and the N
input subband signals are processed in combining unit 77 in dependence on stereo parameters
such as the ICC parameter so as to derive N output subband signals for a first output channel
and N output subband signals for a second output channel. The N output subband signals for
the first output channel are fed through the N bands complex QMF synthesis filter 78 to form
the first PCM output signal for left L. The N output subband signals for the second output
channel are fed through the N bands complex QMF synthesis filter 79 to form the second PCM
output signal for right R. In practical embodiments, N=64 and M=32.
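
How the combining unit 77 applies the ICC parameter is not detailed in this description. One common way to obtain a prescribed correlation, shown here purely as an illustrative assumption and not as the method of the embodiment, is to rotate the input subband signal m and the delayed subband signal d by an angle a chosen so that cos(2a) equals the desired correlation. The sketch assumes that m and d are uncorrelated and of equal power; the function names are hypothetical.

```python
import math

def mix_to_correlation(m, d, icc):
    """Mix an input subband signal m with its de-correlated version d.

    Returns (left, right) with left = cos(a)*m + sin(a)*d and
    right = cos(a)*m - sin(a)*d, where cos(2a) = icc. If m and d are
    uncorrelated with equal power, left and right have correlation icc.
    """
    if not -1.0 <= icc <= 1.0:
        raise ValueError("icc must lie in [-1, 1]")
    a = 0.5 * math.acos(icc)
    c, s = math.cos(a), math.sin(a)
    left = [c * x + s * y for x, y in zip(m, d)]
    right = [c * x - s * y for x, y in zip(m, d)]
    return left, right

def correlation(x, y):
    """Normalized real correlation of two complex sequences."""
    inner = sum((a * complex(b).conjugate()).real for a, b in zip(x, y))
    norm = math.sqrt(sum(abs(a) ** 2 for a in x) * sum(abs(b) ** 2 for b in y))
    return inner / norm

# Toy subband signals: orthogonal and of equal power.
m = [1 + 0j, 1 + 0j, 1 + 0j, 1 + 0j]
d = [1 + 0j, -1 + 0j, 1 + 0j, -1 + 0j]
left, right = mix_to_correlation(m, d, icc=0.3)
measured = correlation(left, right)  # close to the requested 0.3
```

When the delayed signal is not perfectly de-correlated from the input, the measured correlation of the two output channels deviates from the requested value, which is one reason the residual correlation of the delayed signals matters for audio quality.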

It should be noted that the above-mentioned embodiments illustrate rather than
limit the invention, and that those skilled in the art will be able to design many alternative
embodiments without departing from the scope of the appended claims. In the claims, any
reference signs placed between parentheses shall not be construed as limiting the claim. The
word 'comprising' does not exclude the presence of other elements or steps than those listed
in a claim. The invention can be implemented by means of hardware comprising several
distinct elements, and by means of a suitably programmed computer. In a device claim
enumerating several means, several of these means can be embodied by one and the same
item of hardware. The mere fact that certain measures are recited in mutually different
dependent claims does not indicate that a combination of these measures cannot be used to
advantage.

CLAIMS:



1. A device for generating an output audio signal (L, R) based on an input audio
signal, the input audio signal comprising a plurality of input subband signals (N), the device
comprising:

a plurality of delay units (76, 501...504) for delaying at least part of the input
subband signals to obtain a plurality of delayed subband signals, wherein at least one input
subband signal is delayed more than a further input subband signal of higher frequency, and

a combining unit (77) for deriving the output audio signal from a combination
of the input audio signal and the plurality of delayed subband signals.

2. A device as claimed in claim 1, wherein the output audio signal includes a
plurality of output subband signals.

3. A device as claimed in claim 2, the device further comprising a subband filter
bank (78, 79) for synthesizing a time domain output audio signal (L, R) from the plurality of
output subband signals.

4. A device as claimed in claim 1, wherein the input audio signal is a mono audio
signal and the output audio signal is a stereo audio signal.

5. A device as claimed in claim 1, wherein the number of delay units is smaller
than the number of input subband signals, and wherein the input subband signals are
subdivided in groups over the plurality of delay units.

6. A device as claimed in claim 5, wherein the plurality of delay units comprises
a first delay unit (501) for delaying a group of relatively high frequency subbands with one
subband sample, and at least one further delay unit (502...504) for delaying a group of
relatively low frequency subbands with at least a further subband sample.

7. A device as claimed in claim 1, wherein the delay units provide delays which
are monotonically increasing from high frequency to low frequency.

8. A device as claimed in claim 1, wherein the subband filter bank is a complex
subband filter bank.

9. A device as claimed in claim 8, wherein the complex subband filter bank is a 
complex Quadrature Mirror Filter bank. 

10. A device as claimed in claim 1, the device further comprising:

an input (70) for obtaining a correlation parameter indicative of a desired
correlation between a first channel (L) and a second channel (R) of the output audio signal
(L, R), and

wherein the combining unit (77) is arranged for obtaining the first channel (L)
and the second channel (R) by combining the input audio signal and the plurality of delayed
subband signals in dependence on the correlation parameter.

11. A device as claimed in claim 10, wherein the first channel (L) and the second
channel (R) each comprise a plurality of output subband signals, and wherein the device
further comprises two synthesis subband filter banks (78, 79) coupled to an output of the
combining unit (77) for generating a first time domain channel (L) and a second time domain
channel (R) on the basis of the output subband signals respectively.

12. A device (700) as claimed in claim 1, wherein the device (700) further
comprises:

an analysis filter bank (72) of M subbands to generate M filtered subband
signals on the basis of a time domain core audio signal,

a high frequency generator (73, 74) for generating a high frequency signal
component derived from the M filtered subband signals, the high frequency signal
component having N-M subband signals, where N>M, the N-M subband signals including
subband signals with a higher frequency than any of the subbands in the M subbands, the M
filtered subbands and the N-M subbands together forming the plurality of input subband
signals (N).

13. A method of providing an output audio signal (L, R) based on an input audio
signal, the input audio signal comprising a plurality of input subband signals (N), the method
comprising:

delaying (501...504) at least part of the input subband signals to obtain a
plurality of delayed subband signals, wherein at least one input subband signal is delayed
more than a further input subband signal of higher frequency, and

deriving the output audio signal from a combination of the input audio signal
and the plurality of delayed subband signals.

14. An apparatus (700) for supplying an output audio signal, the apparatus
comprising:

an input unit (70) for obtaining an encoded audio signal,

a decoder (71) for decoding the encoded audio signal to obtain a decoded
signal including a plurality of subband signals,

a device as claimed in claim 1 for obtaining the output audio signal based on
the decoded signal, and

an output unit for supplying the output audio signal.

ABSTRACT:



An output audio signal (L, R) is generated based on an input audio signal, the
input audio signal comprising a plurality of input subband signals (N). The input subband 
signals are delayed in a plurality of delay units (76) to obtain a plurality of delayed subband 
signals, wherein at least one input subband signal is delayed more than a further input 
subband signal of higher frequency, and wherein the output audio signal is derived (77) from 
a combination of the input audio signal and the plurality of delayed subband signals. 